Abstract

We show that algebraic varieties with maximum likelihood degree one are exactly the 
images of reduced A-discriminantal varieties under monomial maps with finite fibers. 
The maximum likelihood estimator corresponding to such a variety is Kapranov's Horn 
uniformization. This extends Kapranov's characterization of A-discriminantal hypersurfaces
to varieties of arbitrary codimension.

1. Main results 

1.1 

Let X be a closed and irreducible subvariety of the algebraic torus 

\[
(\mathbb{C}^*)^m = \Big\{ p = (p_1, \dots, p_m) \in \mathbb{C}^m \ \Big|\ \prod_{i=1}^m p_i \neq 0 \Big\}.
\]

If $u = (u_1, \dots, u_m)$ is a set of integers, then the likelihood function of $X$ is defined to be

\[
L = L(p,u) = \prod_{i=1}^m p_i^{u_i} : X \longrightarrow \mathbb{C}^*.
\]

One is often interested in a statistical model $X$ contained in the hyperplane $\{\sum_{i=1}^m p_i = 1\}$, and
in real critical points of the likelihood function corresponding to positive integer data $u$. One of
the critical points will provide parameters p which best explain the observation u. 

We refer to [CHKS06, HKS05, PS05] for an introduction to the problem of maximum likelihood
estimation in the setting of algebraic statistics. For the study of critical points of $L$ from a
more geometric point of view, see [Dam99, Dam00, FK00, Huh12, OT95, Sil96, Var95].

Write $X_{\mathrm{sm}}$ for the set of smooth points of $X$.

Definition 1. The maximum likelihood degree of $X \subseteq (\mathbb{C}^*)^m$ is the number of critical points of
$L(p, u)$ on $X_{\mathrm{sm}}$ for sufficiently general $u$.

It will become clear in Section 3 that this number is finite and well-defined. We consider the
following problem posed in [HKS05, Problem 14] and [Stu09, Section 3].

Problem. Find a geometric characterization of varieties with maximum likelihood degree one. 

Theorems 2 and 5 below show that the class of varieties in question is essentially the class
of $A$-discriminantal varieties in the sense of Gelfand, Kapranov, and Zelevinsky [GKZ94]. In
particular, there are only countably many subvarieties of $(\mathbb{C}^*)^m$ whose maximum likelihood
degree is one, one for each integral matrix with $m$ columns whose column sums are zero, up to
scaling of coordinates $p$.

Theorem 2. A subvariety of $(\mathbb{C}^*)^m$ has maximum likelihood degree one if and only if it admits
Kapranov's Horn uniformization. More precisely, the following are equivalent:

(i) $X \subseteq (\mathbb{C}^*)^m$ has maximum likelihood degree one.

(ii) There is a vector of nonzero complex constants $d = (d_1, \dots, d_m)$, a positive integer $n$, and
an integral matrix

\[
B = \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nm} \end{pmatrix}
\]

whose column sums are zero, such that the rational map

\[
\Psi : \mathbb{P}^{m-1} \dashrightarrow (\mathbb{C}^*)^m, \qquad (u_1, \dots, u_m) \longmapsto (\Psi_1, \dots, \Psi_m),
\]

maps $\mathbb{P}^{m-1}$ dominantly to $X$, where

\[
\Psi_k(u_1, \dots, u_m) = d_k \prod_{i=1}^n \Big( \sum_{j=1}^m b_{ij} u_j \Big)^{b_{ik}}, \qquad 1 \le k \le m.
\]

Here we agree that zero to the power of zero is one.
In this case, $X \subseteq (\mathbb{C}^*)^m$ uniquely determines, and is determined by, $\Psi$.

The rational functions $\Psi_k$ are homogeneous of degree zero in the variables $u$, because column
sums of $B$ are assumed to be zero. The rational map $\Psi$ is the likelihood estimator of $X$ which
maps the data vector $u$ to the unique critical point of the corresponding likelihood function
$L(p,u)$.
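
The formula for $\Psi_k$ in Theorem 2 can be evaluated mechanically. The following is a minimal sketch in Python using exact rational arithmetic; the function name `horn` and the sample values of `d`, `B`, and `u` are illustrative choices and not notation from the paper. The matrix `B` is stored as a list of rows, and zero exponents are skipped, in line with the convention that zero to the power of zero is one.

```python
from fractions import Fraction

def horn(d, B, u):
    """Evaluate Psi(u) = d o (Bu)^B for an n x m integer matrix B given as a list of rows."""
    n, m = len(B), len(B[0])
    # The column sums of B must vanish, so that Psi is homogeneous of degree zero.
    assert all(sum(B[i][k] for i in range(n)) == 0 for k in range(m))
    # Linear forms (Bu)_i = sum_j b_ij * u_j.
    Bu = [sum(Fraction(B[i][j]) * u[j] for j in range(m)) for i in range(n)]
    values = []
    for k in range(m):
        value = Fraction(d[k])
        for i in range(n):
            if B[i][k] != 0:  # convention 0^0 = 1: a zero exponent contributes nothing
                value *= Bu[i] ** B[i][k]
        values.append(value)
    return values

# Hypothetical sample data: two identical columns with zero column sum.
B = [[1, 1], [-1, -1]]
d = [-3, -5]
u = [Fraction(2), Fraction(7)]
print(horn(d, B, u))                    # a constant map: the point (3, 5)
print(horn(d, B, [10 * x for x in u]))  # rescaling u does not change the value
```

Because both columns of this sample `B` are equal, the map is constant, and rescaling `u` leaves the value unchanged, which is the degree-zero homogeneity stated above.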

The proof of Theorem 2 closely follows Kapranov's presentation of Horn's ideas from 1889
[Hor89]. As Kapranov remarks in [Kap91], the present paper could have been written a hundred
years ago.



1.2 

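Theorem 2 shows that the set of all varieties with maximum likelihood degree one is partially
ordered by taking images under monomial maps with finite fibers. To be more precise, let $B$ be
an $n \times m$ integral matrix as in Theorem 2 and let $C$ be an $m \times l$ integral matrix with linearly
independent rows. Consider the homomorphism

\[
\phi^C : (\mathbb{C}^*)^m \longrightarrow (\mathbb{C}^*)^l, \qquad r = (r_1, \dots, r_m) \longmapsto r^C := \Big( \prod_{i=1}^m r_i^{c_{i1}}, \dots, \prod_{i=1}^m r_i^{c_{il}} \Big)
\]

and the linear projection

\[
\phi_C : \mathbb{P}^{l-1} \dashrightarrow \mathbb{P}^{m-1}, \qquad v = (v_1, \dots, v_l) \longmapsto Cv := \Big( \sum_{j=1}^l c_{1j} v_j, \dots, \sum_{j=1}^l c_{mj} v_j \Big).
\]

In the same notation, the Horn uniformization $\Psi$ of Theorem 2 can be written

\[
\Psi(u) = d \circ (Bu)^B,
\]

where $\circ$ is the Hadamard product, the entrywise multiplication.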
Lemma 3. For $v \in \mathbb{C}^l$ and $r, d \in (\mathbb{C}^*)^m$, we have

\[
B(Cv) = (BC)v, \qquad (r^B)^C = r^{BC}, \qquad (d \circ r)^C = d^C \circ r^C.
\]

The same rules continue to hold if $v, r, d$ are replaced by matrices of appropriate sizes.
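
The exponent rules of Lemma 3 can be spot-checked on small integer matrices. The sketch below is only an illustration: the helper names `matmul`, `monomial`, and `hadamard` and all sample matrices are assumptions made for this example. In the identity $(r^B)^C = r^{BC}$ the vector $r$ is taken of length $n$, the number of rows of $B$, which is one of the "appropriate sizes" the lemma allows.

```python
from fractions import Fraction

def matmul(B, C):
    """Ordinary product of integer matrices given as lists of rows."""
    return [[sum(B[i][k] * C[k][j] for k in range(len(C))) for j in range(len(C[0]))]
            for i in range(len(B))]

def monomial(r, M):
    """r^M: the j-th entry is the product over i of r_i ** M[i][j]."""
    out = []
    for j in range(len(M[0])):
        value = Fraction(1)
        for i in range(len(M)):
            value *= Fraction(r[i]) ** M[i][j]
        out.append(value)
    return out

def hadamard(a, b):
    """Entrywise product of two vectors of the same length."""
    return [x * y for x, y in zip(a, b)]

# Hypothetical sample data: B is 3 x 2, C is 2 x 3, all vectors have nonzero entries.
B = [[1, 0], [1, 2], [-2, -2]]
C = [[1, 0, 1], [0, 1, 1]]
r = [2, 3, 5]
d2, r2 = [2, 3], [5, 7]

assert monomial(monomial(r, B), C) == monomial(r, matmul(B, C))                       # (r^B)^C = r^(BC)
assert monomial(hadamard(d2, r2), C) == hadamard(monomial(d2, C), monomial(r2, C))    # (d o r)^C = d^C o r^C
```

Both assertions hold for these values, since each side is the same product of integer powers of the entries.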

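It follows that there is a commutative diagram

\[
\begin{array}{ccc}
\mathbb{P}^{l-1} & \overset{\phi_C}{\dashrightarrow} & \mathbb{P}^{m-1} \\
{\scriptstyle \Psi'} \downarrow & & \downarrow {\scriptstyle \Psi} \\
(\mathbb{C}^*)^{l} & \overset{\phi^C}{\longleftarrow} & (\mathbb{C}^*)^{m}
\end{array}
\]

where $\Psi'$ is the Horn uniformization associated to $d^C$ and $BC$:

\[
\Psi'(v) = d^C \circ (BCv)^{BC}.
\]

Since $\phi_C$ is dominant and $\phi^C$ is proper, we have

\[
\phi^C\big( \overline{\operatorname{im}(\Psi)} \big) = \overline{\operatorname{im}(\Psi')}.
\]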



Corollary 4. If $X \subseteq (\mathbb{C}^*)^m$ is a closed subvariety with maximum likelihood degree one, then
$\phi^C(X) \subseteq (\mathbb{C}^*)^l$ is a closed subvariety with maximum likelihood degree one.

Note that it is necessary to assume that $C$ has rank $m$ in order to ensure that $\phi^C(X) \subseteq (\mathbb{C}^*)^l$
is closed and has maximum likelihood degree one. Note also that the maximum likelihood degrees
of $X$ and $\phi^C(X)$ are different in general, even if $C$ has rank $m$. See Example 9.

1.3 

Define a partial order on the set of all varieties with maximum likelihood degree one by

\[
\big( X \subseteq (\mathbb{C}^*)^m \big) \succeq \big( X' \subseteq (\mathbb{C}^*)^l \big)
\iff \text{there is an } m \times l \text{ integral matrix } C \text{ of rank } m \text{ such that } \phi^C(X) = X'.
\]

The maximal elements of this partially ordered set are precisely the reduced $A$-discriminantal
varieties of [GKZ94, Chapter 9], up to scaling of coordinates.

Theorem 5. The following are equivalent:

(i) $X \subseteq (\mathbb{C}^*)^m$ has maximum likelihood degree one.

(ii) There is a vector of nonzero complex constants $d = (d_1, \dots, d_m)$, positive integers $n$ and $k$,
an integral matrix

\[
A = \begin{pmatrix} 1 & \cdots & 1 \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kn} \end{pmatrix}
\]

whose columns generate $\mathbb{Z}^k$, and an integral matrix of rank $n - k$

\[
B = \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nm} \end{pmatrix}
\]

with $AB = 0$, such that the monomial map

\[
d \circ \phi^B : (\mathbb{C}^*)^n \longrightarrow (\mathbb{C}^*)^m, \qquad q = (q_1, \dots, q_n) \longmapsto d \circ q^B := \Big( d_1 \prod_{i=1}^n q_i^{b_{i1}}, \dots, d_m \prod_{i=1}^n q_i^{b_{im}} \Big)
\]

maps the $A$-discriminantal variety $\nabla_A \cap (\mathbb{C}^*)^n$ dominantly to $X$.
In this case, $d \circ \phi^B$ factors through a monomial map with finite fibers

\[
T(\ker(A)) \longrightarrow (\mathbb{C}^*)^m,
\]

which maps the reduced $A$-discriminantal variety in $T(\ker(A))$ birationally onto $X$.

Here $T(\ker(A)) := \operatorname{Hom}(\ker(A), \mathbb{C}^*)$ is the algebraic torus whose character lattice is $\ker(A)$.
If columns of $B$ form a basis of $\ker(A)$, then $X$ is the reduced $A$-discriminantal variety, up to
scaling of coordinates by $d$.

Our basic reference on $A$-discriminants will be [GKZ94]. The definition and basic properties
of the $A$-discriminantal variety and the reduced $A$-discriminantal variety will be recalled in Section 3.6.



2. Examples and remarks 



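Example 6. A point $\{p\} \subseteq (\mathbb{C}^*)^m$ has maximum likelihood degree one. The corresponding Horn
uniformization is the constant map

\[
\Psi : \mathbb{P}^{m-1} \dashrightarrow (\mathbb{C}^*)^m, \qquad u \longmapsto d \circ (Bu)^B,
\]

where $d = -p$ and

\[
B = \begin{pmatrix} 1 & \cdots & 1 \\ -1 & \cdots & -1 \end{pmatrix}.
\]

The choice of $d$ and $B$ is in general not unique. For example, without changing $\Psi$ one may take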

1 ... i 



and 



B 



27 



and 



B 



Consequently, the choice of $A$ in Theorem 5 is not unique.

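Example 7. Consider two binary random variables, and write $p = (p_{00}, p_{01}, p_{10}, p_{11})$ for the
joint probabilities corresponding to four possible outcomes. The case when the two events are
independent can be modeled by the algebraic variety

\[
X = \big\{ p \ \big|\ p_{00} p_{11} - p_{01} p_{10} = 0, \ p_{00} + p_{01} + p_{10} + p_{11} = 1 \big\} \subseteq (\mathbb{C}^*)^4.
\]

$X$ has maximum likelihood degree one, and the likelihood function of $X$ corresponding to a given
data vector $u = (u_{00}, u_{01}, u_{10}, u_{11})$ is maximized at its unique critical point

\[
\Psi(u) = \Big( \frac{u_{0+} u_{+0}}{u_{++}^2}, \ \frac{u_{0+} u_{+1}}{u_{++}^2}, \ \frac{u_{1+} u_{+0}}{u_{++}^2}, \ \frac{u_{1+} u_{+1}}{u_{++}^2} \Big),
\]

where

\[
\begin{aligned}
u_{0+} &= u_{00} + u_{01}, \\
u_{1+} &= u_{10} + u_{11}, \\
u_{++} &= u_{00} + u_{01} + u_{10} + u_{11}, \\
u_{+0} &= u_{00} + u_{10}, \\
u_{+1} &= u_{01} + u_{11}.
\end{aligned}
\]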
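The critical point $\Psi(u)$ provides parameters $p$ which best explain $u$. Note that $\Psi$ is the Horn
uniformization

\[
\Psi : \mathbb{P}^3 \dashrightarrow (\mathbb{C}^*)^4, \qquad u \longmapsto d \circ (Bu)^B,
\]

with

\[
d = (4, 4, 4, 4) \qquad \text{and} \qquad
B = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ -2 & -2 & -2 & -2 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.
\]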
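
As a sanity check of this identification, the Horn form can be compared with the familiar closed formula for the independence model on sample counts. The sketch below is an illustration only: the function names and the data vector `u` are assumptions for this example, and the rows of `B` are ordered as $(u_{0+}, u_{1+}, u_{++}, u_{+0}, u_{+1})$.

```python
from fractions import Fraction

# Rows correspond to the linear forms u_{0+}, u_{1+}, -2 u_{++}, u_{+0}, u_{+1}.
B = [[1, 1, 0, 0], [0, 0, 1, 1], [-2, -2, -2, -2], [1, 0, 1, 0], [0, 1, 0, 1]]
d = [4, 4, 4, 4]

def horn(d, B, u):
    """Evaluate d o (Bu)^B, skipping zero exponents (convention 0^0 = 1)."""
    n, m = len(B), len(B[0])
    Bu = [sum(Fraction(B[i][j]) * u[j] for j in range(m)) for i in range(n)]
    out = []
    for k in range(m):
        value = Fraction(d[k])
        for i in range(n):
            if B[i][k] != 0:
                value *= Bu[i] ** B[i][k]
        out.append(value)
    return out

def independence_mle(u):
    """Row marginal times column marginal over the squared total, in the order 00, 01, 10, 11."""
    u00, u01, u10, u11 = u
    total = u00 + u01 + u10 + u11
    rows = (u00 + u01, u10 + u11)
    cols = (u00 + u10, u01 + u11)
    return [Fraction(r * c, total * total) for r in rows for c in cols]

u = [3, 5, 2, 10]  # hypothetical table of counts
p = horn(d, B, u)
assert p == independence_mle(u)
assert sum(p) == 1 and p[0] * p[3] == p[1] * p[2]  # p lies on the model X
```

For this sample table both formulas give $(1/10, 3/10, 3/20, 9/20)$, a point of $X$.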



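We check that $X$ is the image of an $A$-discriminant under a monomial map. Choose an integral
matrix $A$ with $AB = 0$ as in Theorem 5. For example, one may take

\[
A = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 & 2 \end{pmatrix}.
\]

We index the columns of $A$ by the variables $q = (q_{0+}, q_{1+}, q_{++}, q_{+0}, q_{+1})$, and consider the space
of all polynomials in $t$ of the form

\[
F(t) = (q_{0+} + q_{1+}) + q_{++} \cdot t + (q_{+0} + q_{+1}) \cdot t^2.
\]

By definition, the $A$-discriminant is the closure of the set of all such $F$ with a double root

\[
\nabla_A = \big\{ q \in \mathbb{C}^5 \ \big|\ q_{++}^2 - 4 (q_{0+} + q_{1+})(q_{+0} + q_{+1}) = 0 \big\} \subseteq \mathbb{C}^5.
\]

The monomial map

\[
d \circ \phi^B : (\mathbb{C}^*)^5 \longrightarrow (\mathbb{C}^*)^4, \qquad
q \longmapsto d \circ q^B = \Big( \frac{4 q_{0+} q_{+0}}{q_{++}^2}, \ \frac{4 q_{0+} q_{+1}}{q_{++}^2}, \ \frac{4 q_{1+} q_{+0}}{q_{++}^2}, \ \frac{4 q_{1+} q_{+1}}{q_{++}^2} \Big)
\]

maps the $A$-discriminant $\nabla_A \cap (\mathbb{C}^*)^5$ dominantly to $X$.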
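
The last claim can be tested on a single point of the discriminant. The following sketch uses a hand-picked integer point $q$ with $q_{++}^2 = 4(q_{0+} + q_{1+})(q_{+0} + q_{+1})$; the function names and the sample point are assumptions made for this illustration.

```python
from fractions import Fraction

def on_discriminant(q):
    """Test the quadratic discriminant equation for q = (q0+, q1+, q++, q+0, q+1)."""
    q0p, q1p, qpp, qp0, qp1 = q
    return qpp * qpp - 4 * (q0p + q1p) * (qp0 + qp1) == 0

def monomial_image(q):
    """Image of q under d o q^B with d = (4, 4, 4, 4), written out coordinate by coordinate."""
    q0p, q1p, qpp, qp0, qp1 = q
    scale = Fraction(4, qpp * qpp)
    return [scale * q0p * qp0, scale * q0p * qp1, scale * q1p * qp0, scale * q1p * qp1]

q = (1, 3, 12, 2, 7)  # hypothetical point: 12^2 = 4 * (1 + 3) * (2 + 7)
assert on_discriminant(q)
p = monomial_image(q)
assert sum(p) == 1                  # the image lies on the hyperplane
assert p[0] * p[3] == p[1] * p[2]   # and satisfies the independence equation
```

The coordinates always satisfy the independence equation, and they sum to $4(q_{0+}+q_{1+})(q_{+0}+q_{+1})/q_{++}^2$, which equals one exactly on the discriminant.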



Example 8. Decomposable graphical models form an interesting class of varieties with maximum
likelihood degree one. An explicit rational expression of the maximum likelihood estimator $\Psi$ is
known in this case [Lau96, Chapter 4.4]. We invite the reader to check that this $\Psi$ indeed is a
Horn uniformization.

Example 9. In general, the maximum likelihood degree of a variety is different from that of its
image under a finite monomial map. For example, the curve

\[
\{ p_1^2 + p_2^2 = 1 \} \subseteq (\mathbb{C}^*)^2
\]

has maximum likelihood degree 4, but its image under the monomial map

\[
(\mathbb{C}^*)^2 \longrightarrow (\mathbb{C}^*)^2, \qquad (p_1, p_2) \longmapsto (p_1^2, p_2^2)
\]

has maximum likelihood degree 1.
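
The count in this example can be made concrete for the curve $p_1^2 + p_2^2 = 1$. A Lagrange multiplier computation gives $p_1^2 = u_1/(u_1+u_2)$ and $p_2^2 = u_2/(u_1+u_2)$, hence four sign choices. The sketch below checks numerically, for positive sample data, that these four points are critical and that squaring sends them to a single point; the function names and the data are assumptions for this illustration, and floating point arithmetic is used, so the checks hold only up to a tolerance.

```python
import math
from itertools import product

def circle_critical_points(u1, u2):
    """The four candidates (+-sqrt(u1/(u1+u2)), +-sqrt(u2/(u1+u2))) for positive u1, u2."""
    a = math.sqrt(u1 / (u1 + u2))
    b = math.sqrt(u2 / (u1 + u2))
    return [(s1 * a, s2 * b) for s1, s2 in product((1, -1), repeat=2)]

def is_critical_on_circle(p, u1, u2, tol=1e-9):
    """Check p1^2 + p2^2 = 1 and that d log L = (u1/p1, u2/p2) is parallel to (2 p1, 2 p2)."""
    p1, p2 = p
    on_curve = math.isclose(p1 * p1 + p2 * p2, 1.0, abs_tol=tol)
    parallel = math.isclose(u1 / p1 * p2 - u2 / p2 * p1, 0.0, abs_tol=tol)
    return on_curve and parallel

u1, u2 = 3, 5  # hypothetical positive data
points = circle_critical_points(u1, u2)
assert len(set(points)) == 4
assert all(is_critical_on_circle(p, u1, u2) for p in points)
# All four points have the same image under (p1, p2) -> (p1^2, p2^2).
assert len({(p1 * p1, p2 * p2) for p1, p2 in points}) == 1
```

The common image is the point $(u_1, u_2)/(u_1+u_2)$ on the line $p_1 + p_2 = 1$, which is the unique critical point of the likelihood function there.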

Remark 10. Suppose $X \subseteq (\mathbb{C}^*)^m$ has maximum likelihood degree one. Then the tropicalization
of $X$ can be computed from the Bergman fan of the matroid defined by the matrix $B$ of Theorem
2. See [DFS07, Section 3].

Remark 11. Let $\mathbb{P}^{m-1}$ be the projective space with homogeneous coordinates $p_1, \dots, p_m$. In
[HKS05, Stu09], the maximum likelihood degree is defined for a closed subvariety $X$ of

\[
H := \big\{ p = (p_1, \dots, p_m) \in \mathbb{P}^{m-1} \ \big|\ p_1 \cdots p_m (p_1 + \cdots + p_m) \neq 0 \big\}.
\]

If $u = (u_1, \dots, u_m)$ is a given data vector, then the corresponding likelihood function of $X$ is
defined to be

\[
L(p,u) = \frac{\prod_{i=1}^m p_i^{u_i}}{(p_1 + \cdots + p_m)^{u_1 + \cdots + u_m}} : X \longrightarrow \mathbb{C}^*.
\]

We note that this setting is compatible with ours. Indeed, $H$ can be viewed as the hyperplane
$\{\sum_{i=1}^m p_i = 1\} \subseteq (\mathbb{C}^*)^m$ by the closed embedding

\[
\iota : H \longrightarrow (\mathbb{C}^*)^m, \qquad p \longmapsto \Big( \frac{p_1}{p_1 + \cdots + p_m}, \dots, \frac{p_m}{p_1 + \cdots + p_m} \Big).
\]

The two definitions of the likelihood function of $X$ agree under the pullback by $\iota$.

Remark 12. Suppose $X$ is a hypersurface of $(\mathbb{C}^*)^m$. For a smooth point $x$ of $X$, let $\gamma_x$ be the
derivative of the inclusion followed by that of the left-translation $x^{-1} \circ -$:

\[
\gamma_x : T_x X \longrightarrow T_x (\mathbb{C}^*)^m \longrightarrow \mathfrak{g} := T_1 (\mathbb{C}^*)^m.
\]

This defines the logarithmic Gauss map to the space of hyperplanes of the Lie algebra

\[
\gamma : X \dashrightarrow \mathbb{P}(\mathfrak{g}^\vee), \qquad x \longmapsto \operatorname{im}(\gamma_x).
\]

$X$ has maximum likelihood degree one if and only if $\gamma$ is birational, because the set of critical
points of the likelihood function of $X$ corresponding to $u = (u_1, \dots, u_m)$ is the fiber of $\gamma$ over
the point

\[
\sum_{i=1}^m u_i \cdot d\log(p_i) \in H^0\big( (\mathbb{C}^*)^m, \Omega^1_{(\mathbb{C}^*)^m} \big) \simeq \mathfrak{g}^\vee.
\]

See Section 3 for more details.

Kapranov states in [Kap91, Theorem 1.3] that

(i) if $X$ is a reduced $A$-discriminantal hypersurface, then $\gamma$ is birational, and

(ii) if $\gamma$ is birational, then $X$ is a reduced $A$-discriminantal hypersurface, up to an automorphism
of the ambient torus.

As pointed out in [CD07, Section 2], a small correction needs to be made to the statement (ii).
If $\gamma$ is birational, then there is a monomial map with finite fibers

\[
T(\ker(A)) \longrightarrow (\mathbb{C}^*)^m,
\]

which maps the reduced $A$-discriminantal variety in $T(\ker(A))$ birationally onto $X$.

Remark 13. If $X \subseteq (\mathbb{C}^*)^m$ is smooth of dimension $d$, then the maximum likelihood degree of $X$
is the signed Euler-Poincaré characteristic $(-1)^d \chi(X)$. See [FK00, Huh12].

Gabber and Loeser show in [GL96, Théorème 8.2] that a perverse sheaf is irreducible and has
Euler-Poincaré characteristic one if and only if it is hypergeometric. It would be interesting to
understand the relation between this result and that of the present paper. See also [LS91, LS92].



3. Proofs 

We closely follow [GKZ94, Huh12, Kap91]. Arguments will be reproduced as needed, for the sake
of self-containedness.
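3.1

Let $X \subseteq (\mathbb{C}^*)^m$ be a closed and irreducible subvariety of dimension $d$. We write the closed
embedding by

\[
\varphi : X \longrightarrow (\mathbb{C}^*)^m, \qquad \varphi = (\varphi_1, \dots, \varphi_m).
\]

For a smooth point $x$ of $X$, let $\gamma_x$ be the derivative of $\varphi$ followed by that of the left-translation
$\varphi(x)^{-1} \circ -$:

\[
\gamma_x : T_x X \longrightarrow T_{\varphi(x)} (\mathbb{C}^*)^m \longrightarrow T_1 (\mathbb{C}^*)^m.
\]

In local coordinates $x_1, \dots, x_d$, $\gamma_x$ is represented by the logarithmic jacobian matrix

\[
\Big( \frac{\partial \log \varphi_i}{\partial x_j} \Big), \qquad 1 \le i \le m, \quad 1 \le j \le d.
\]

This defines the logarithmic Gauss map to the Grassmannian of the Lie algebra $\mathfrak{g}$ of $(\mathbb{C}^*)^m$

\[
\gamma : X \dashrightarrow \operatorname{Gr}(d, \mathfrak{g}), \qquad x \longmapsto \operatorname{im}(\gamma_x).
\]

When $X$ is a hypersurface, $\gamma$ agrees with the logarithmic Gauss map of [GKZ94, Section 9.3]:

\[
\gamma : X \dashrightarrow \operatorname{Gr}(m-1, \mathfrak{g}) = \mathbb{P}(\mathfrak{g}^\vee).
\]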

3.2 

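We write $p = (p_1, \dots, p_m)$ for the coordinate functions of $(\mathbb{C}^*)^m$ as before. This defines a basis
of the dual $\mathfrak{g}^\vee \simeq \mathbb{C}^m$ corresponding to differential forms

\[
d\log(p_1), \dots, d\log(p_m) \in H^0\big( (\mathbb{C}^*)^m, \Omega^1_{(\mathbb{C}^*)^m} \big).
\]

Hereafter we fix this choice of basis of $\mathfrak{g}^\vee$, and identify $\mathfrak{g}^\vee$ with the space of data vectors $u$.
Consider the vector bundle homomorphism defined by the pullback of differential forms

\[
\gamma^\vee : X_{\mathrm{sm}} \times \mathfrak{g}^\vee \longrightarrow \Omega^1_{X_{\mathrm{sm}}}, \qquad (x, u) \longmapsto \sum_{i=1}^m u_i \cdot d\log(\varphi_i)(x), \qquad u = (u_1, \dots, u_m),
\]

where $\varphi = (\varphi_1, \dots, \varphi_m)$ denotes the closed embedding of $X$ into $(\mathbb{C}^*)^m$.
The induced linear map $\gamma_x^\vee$ between the fibers over a smooth point $x$ is dual to the injective
linear map of the previous subsection

\[
\gamma_x : T_x X \longrightarrow \mathfrak{g}.
\]

Therefore $\gamma^\vee$ is surjective and $\ker(\gamma^\vee)$ is a vector bundle.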

Definition 14. The variety of critical points $\mathfrak{X}$ of $X \subseteq (\mathbb{C}^*)^m$ is defined to be the closure

\[
\mathfrak{X} := \overline{\mathbb{P}(\ker(\gamma^\vee))} \subseteq X \times \mathbb{P}(\mathfrak{g}^\vee).
\]

Note that $\ker(\gamma^\vee)$ is a vector bundle of rank $m - d$. Therefore $\mathfrak{X}$ is irreducible and

\[
\dim \mathfrak{X} = m - 1.
\]

If $u$ is integral and $x$ is a smooth point of $X$, then $(x, u)$ is in $\mathfrak{X}$ if and only if $x$ is a critical
point of the likelihood function of $X$ corresponding to $u$. Since $\dim \mathfrak{X} = \dim \mathbb{P}(\mathfrak{g}^\vee)$, the maximum
likelihood degree of $X$ is finite and well-defined.
3.3

Write $\mathbb{P}^{m-1}$ for the projective space $\mathbb{P}(\mathfrak{g}^\vee)$ with the homogeneous coordinates $u$. Let $\Psi$ be a
rational map

\[
\Psi : \mathbb{P}^{m-1} \dashrightarrow (\mathbb{C}^*)^m, \qquad \Psi = (\Psi_1, \dots, \Psi_m).
\]

Each component of $\Psi$ should be a homogeneous rational function of degree zero in the variables
$u$. We have Euler's relation

\[
\sum_{i=1}^m u_i \frac{\partial \log \Psi_j}{\partial u_i} = 0, \qquad 1 \le j \le m.
\]

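The following lemma will play a central role in the proof of Theorem 2.

Lemma 15. Suppose that the closure of the image of $\Psi$ is $X$, and write $\mathfrak{X}$ for the variety of
critical points of $X$. Then the following conditions are equivalent.

(i) $\mathfrak{X}$ is the closure of the graph of $\Psi$.

(ii) The graph of $\Psi$ is contained in $\mathfrak{X}$.

(iii) We have

\[
\sum_{i=1}^m u_i \frac{\partial \log \Psi_i}{\partial u_j} = 0, \qquad 1 \le j \le m.
\]

(iv) We have

\[
\frac{\partial \log \Psi_i}{\partial u_j} = \frac{\partial \log \Psi_j}{\partial u_i}, \qquad 1 \le i \le m, \quad 1 \le j \le m.
\]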

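Proof. Since the variety of critical points $\mathfrak{X}$ is irreducible of dimension $m - 1$, (i) and (ii) are
equivalent. We prove that (ii) and (iii) are equivalent.

By generic smoothness, for a sufficiently general $u \in \mathfrak{g}^\vee$,

1. $\Psi(u)$ is a smooth point of $X$, and

2. $\Psi|_U : U \to X$ is a submersion for a small neighborhood $U$ of $u$ in $\mathbb{P}^{m-1}$.

Note from the construction of $\mathfrak{X}$ that the graph of $\Psi$ is contained in $\mathfrak{X}$ if and only if such $u$ is
contained in the kernel of $\gamma^\vee_{\Psi(u)}$. Dually, this condition is satisfied if and only if the hyperplane
of $\mathfrak{g}$ defined by $u$ contains the image of

\[
\gamma_{\Psi(u)} : T_{\Psi(u)} X \longrightarrow \mathfrak{g}.
\]

We express this last condition in terms of equations.

Fix a sufficiently general $u \in \mathfrak{g}^\vee$ as above. The key player is the linear mapping

\[
\Phi_u : \mathfrak{g}^\vee \longrightarrow \mathfrak{g},
\]

defined as the composition

\[
\mathfrak{g}^\vee \simeq T_u \mathfrak{g}^\vee \longrightarrow T_u \mathbb{P}^{m-1} \longrightarrow T_{\Psi(u)} (\mathbb{C}^*)^m \longrightarrow \mathfrak{g}.
\]

The first is the derivative of the quotient map defining $\mathbb{P}^{m-1}$, the second is the derivative of $\Psi$,
and the last is the derivative of the left-translation $\Psi(u)^{-1} \circ -$. In coordinates, $\Phi_u$ is represented
by the logarithmic jacobian matrix

\[
\begin{pmatrix}
\dfrac{\partial \log \Psi_1}{\partial u_1} & \cdots & \dfrac{\partial \log \Psi_1}{\partial u_m} \\
\vdots & & \vdots \\
\dfrac{\partial \log \Psi_m}{\partial u_1} & \cdots & \dfrac{\partial \log \Psi_m}{\partial u_m}
\end{pmatrix}.
\]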
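By the genericity assumption on $u$ made above, the columns of the logarithmic jacobian matrix
generate the image of $\gamma_{\Psi(u)}$ in $\mathfrak{g}$. Therefore the image of $\gamma_{\Psi(u)}$ is contained in the hyperplane
defined by $u$ if and only if

\[
\begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix}
\begin{pmatrix}
\dfrac{\partial \log \Psi_1}{\partial u_1} & \cdots & \dfrac{\partial \log \Psi_1}{\partial u_m} \\
\vdots & & \vdots \\
\dfrac{\partial \log \Psi_m}{\partial u_1} & \cdots & \dfrac{\partial \log \Psi_m}{\partial u_m}
\end{pmatrix}
= 0.
\]

This proves the equivalence of (ii) and (iii).
Now suppose that (iii) holds. Then

\[
\frac{\partial}{\partial u_j} \Big( \sum_{k=1}^m u_k \frac{\partial \log \Psi_k}{\partial u_i} \Big) = 0, \qquad 1 \le i \le m, \quad 1 \le j \le m,
\]

and hence

\[
\frac{\partial \log \Psi_j}{\partial u_i}
= - \sum_{k=1}^m u_k \frac{\partial}{\partial u_j} \frac{\partial}{\partial u_i} \log \Psi_k
= - \sum_{k=1}^m u_k \frac{\partial}{\partial u_i} \frac{\partial}{\partial u_j} \log \Psi_k
= \frac{\partial \log \Psi_i}{\partial u_j}.
\]

Therefore (iii) implies (iv). Lastly, (iii) is obtained from Euler's relation and (iv). □
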
3.4 

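We continue to assume that $\Psi$ is a rational map from $\mathbb{P}^{m-1}$ to $(\mathbb{C}^*)^m$ whose components are
homogeneous of degree zero in the variables $u$. The following statement can be found in [Kap91,
Proposition 3.1], where Kapranov attributes the result to Horn [Hor89].

Lemma 16. The following conditions are equivalent.

(i) We have

\[
\frac{\partial \log \Psi_i}{\partial u_j} = \frac{\partial \log \Psi_j}{\partial u_i}, \qquad 1 \le i \le m, \quad 1 \le j \le m.
\]

(ii) There is a vector of nonzero constants $d = (d_1, \dots, d_m)$, a positive integer $n$, and an integral
matrix

\[
B = \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nm} \end{pmatrix}
\]

whose column sums are zero, such that

\[
\Psi_k(u_1, \dots, u_m) = d_k \prod_{i=1}^n \Big( \sum_{j=1}^m b_{ij} u_j \Big)^{b_{ik}}, \qquad 1 \le k \le m.
\]

Here we agree that zero to the power of zero is one.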

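Proof that (i) implies (ii). We employ the notation introduced in Lemma 3. Use unique factorization
in $\mathbb{C}[u_1, \dots, u_m]$ to write

\[
\Psi = f^B,
\]

where

1. $f = (f_1, \dots, f_n)$ is a vector of irreducible homogeneous polynomials of degrees $(\delta_1, \dots, \delta_n)$ in
the variables $u$, and

2. $B$ is an $n \times m$ integral matrix such that

\[
\begin{bmatrix} \delta_1, \dots, \delta_n \end{bmatrix}
\begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nm} \end{pmatrix}
= 0.
\]

We may assume that $f_i$ and $f_j$ are relatively prime to each other for $i \neq j$. Now (i) reads

\[
\sum_{k=1}^n \frac{1}{f_k} \Big( b_{ki} \frac{\partial f_k}{\partial u_j} - b_{kj} \frac{\partial f_k}{\partial u_i} \Big) = 0, \qquad 1 \le i \le m, \quad 1 \le j \le m.
\]

Since the polynomial inside the parenthesis has degree one less than $f_k$, which is relatively prime
to all the other components of $f$, we have

\[
b_{ki} \frac{\partial f_k}{\partial u_j} - b_{kj} \frac{\partial f_k}{\partial u_i} = 0.
\]

Therefore there are homogeneous polynomials $g_k$ in $u$ such that

\[
\frac{\partial f_k}{\partial u_i} = b_{ki} \cdot g_k.
\]

Now use Euler's relation to note that

\[
\delta_k f_k = \sum_{i=1}^m u_i \frac{\partial f_k}{\partial u_i} = \Big( \sum_{i=1}^m b_{ki} u_i \Big) g_k.
\]

Since $f_k$ are assumed to be irreducible, $g_k$ should be nonzero constants. This shows that

\[
f = e \circ Bu
\]

for a vector of nonzero constants $e = (e_1, \dots, e_n)$. The proof is completed by setting

\[
d = e^B.
\]

□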
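
The converse direction, that a map of the Horn form satisfies the symmetry condition (i), follows from the formula $\partial \log \Psi_k / \partial u_j = \sum_{i} b_{ik} b_{ij} / (Bu)_i$, which is visibly symmetric in $k$ and $j$. The sketch below checks this formula, together with Euler's relation, in exact arithmetic on one sample; the function name `log_jacobian` and the chosen `B` and `u` are assumptions for this illustration, and every linear form $(Bu)_i$ is required to be nonzero at `u`.

```python
from fractions import Fraction

def log_jacobian(B, u):
    """Entry [k][j] is d log Psi_k / d u_j = sum_i b_ik * b_ij / (Bu)_i for Psi = d o (Bu)^B."""
    n, m = len(B), len(B[0])
    Bu = [sum(Fraction(B[i][j]) * u[j] for j in range(m)) for i in range(n)]
    assert all(x != 0 for x in Bu)  # the formula needs every linear form to be nonzero at u
    return [[sum(Fraction(B[i][k] * B[i][j]) / Bu[i] for i in range(n)) for j in range(m)]
            for k in range(m)]

# Hypothetical sample: the 5 x 4 matrix of the independence model and a table of counts.
B = [[1, 1, 0, 0], [0, 0, 1, 1], [-2, -2, -2, -2], [1, 0, 1, 0], [0, 1, 0, 1]]
u = [3, 5, 2, 10]
J = log_jacobian(B, u)
m = len(u)

# Symmetry condition (i) of Lemma 16.
assert all(J[k][j] == J[j][k] for k in range(m) for j in range(m))
# Euler's relation: sum_j u_j * d log Psi_k / d u_j equals the k-th column sum of B, which is zero.
assert all(sum(u[j] * J[k][j] for j in range(m)) == 0 for k in range(m))
```

The constants $d$ play no role here because they disappear after taking logarithmic derivatives.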



3.5 

Proof of Theorem 2. Suppose that $X$ has maximum likelihood degree one. Let $\mathrm{pr}_1$ and $\mathrm{pr}_2$ be
the projections from the variety of critical points $\mathfrak{X}$

\[
X \xleftarrow{\ \mathrm{pr}_1\ } \mathfrak{X} \xrightarrow{\ \mathrm{pr}_2\ } \mathbb{P}^{m-1}.
\]

The assumption made on $X$ is equivalent to the statement that $\mathrm{pr}_2$ is a birational morphism.
Let $\mathrm{pr}_2^{-1}$ be the rational inverse of $\mathrm{pr}_2$, and define

\[
\Psi := \mathrm{pr}_1 \circ \mathrm{pr}_2^{-1} : \mathbb{P}^{m-1} \dashrightarrow (\mathbb{C}^*)^m.
\]

Since the graph of $\Psi$ is contained in $\mathfrak{X}$, Lemma 15 and Lemma 16 prove what we want.
Conversely, suppose that $\Psi$ is a rational map of the form

\[
\Psi = d \circ (Bu)^B,
\]

which maps dominantly to $X$. By Lemma 15 and Lemma 16, $\mathfrak{X}$ is the closure of the graph of
$\Psi$. This shows that $\mathrm{pr}_2$ is a birational morphism.
The above argument also shows that $X \subseteq (\mathbb{C}^*)^m$ uniquely determines, and is determined by,
the rational map $\Psi$. □

3.6 

Before proceeding to the proof of Theorem 5, we recall the definition and basic properties of
$A$-discriminantal varieties and reduced $A$-discriminantal varieties, following [GKZ94, Chapter 9].
Some notations are adjusted for the internal consistency of the present paper.
Let $A$ be an integral matrix of the form

\[
A = \begin{pmatrix} 1 & \cdots & 1 \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kn} \end{pmatrix}
\]

whose columns generate $\mathbb{Z}^k$. Write $\{\omega_1, \dots, \omega_n\}$ for the set of column vectors of $A$, and consider
the affine space $\mathbb{C}^n$ of Laurent polynomials of the form

\[
F(t) = \sum_{i=1}^n q_i \cdot t^{\omega_i}, \qquad q = (q_1, \dots, q_n) \in \mathbb{C}^n, \qquad t = (t_1, \dots, t_k).
\]

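Definition 17. The $A$-discriminantal variety $\nabla_A$ is the closure of the set

\[
\nabla_A^{\circ} = \big\{ F \in \mathbb{C}^n \ \big|\ \{F = 0\} \text{ has a singular point in } (\mathbb{C}^*)^k \big\} \subseteq \mathbb{C}^n.
\]

The projective dual of $\mathbb{P}(\nabla_A) \subseteq \mathbb{P}^{n-1}$ is the toric variety $X_A \subseteq \mathbb{P}^{n-1}$, defined as the closure
of the image of the monomial map

\[
(\mathbb{C}^*)^k \longrightarrow \mathbb{P}^{n-1}, \qquad t \longmapsto (t^{\omega_1}, \dots, t^{\omega_n}).
\]

Let $\mathscr{B}$ be an integral matrix whose columns form a basis of $\ker A$. In other words, $\mathscr{B}$ is a
Gale dual of $A$. We have exact sequences

\[
0 \longrightarrow \ker(A) \simeq \mathbb{Z}^{n-k} \xrightarrow{\ \mathscr{B}\ } \mathbb{Z}^n \xrightarrow{\ A\ } \mathbb{Z}^k \longrightarrow 0
\]

and

\[
0 \longrightarrow (\mathbb{C}^*)^k \xrightarrow{\ \phi^A\ } (\mathbb{C}^*)^n \xrightarrow{\ \phi^{\mathscr{B}}\ } (\mathbb{C}^*)^{n-k} \simeq T(\ker(A)) \longrightarrow 0.
\]

Note that $\nabla_A$ is invariant under the action of $(\mathbb{C}^*)^k$.

Definition 18. The reduced $A$-discriminantal variety $\widetilde{\nabla}_A$ is the image of $\nabla_A \cap (\mathbb{C}^*)^n$ in $T(\ker A)$.

Reduced $A$-discriminantal varieties admit a Horn uniformization [GKZ94, Theorem 9.3.3]:

Theorem 19. Let $\Psi_{\mathscr{B}}$ be the Horn uniformization

\[
\Psi_{\mathscr{B}} : \mathbb{P}^{n-k-1} \dashrightarrow (\mathbb{C}^*)^{n-k}, \qquad v \longmapsto (\mathscr{B} v)^{\mathscr{B}}.
\]

Then the closure of the image of $\Psi_{\mathscr{B}}$ is the reduced $A$-discriminantal variety $\widetilde{\nabla}_A$.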
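
The smallest classical instance may help to fix ideas. For quadratic polynomials $q_0 + q_1 t + q_2 t^2$ one can take $A$ with rows $(1,1,1)$ and $(0,1,2)$, whose kernel is spanned by $(1,-2,1)$, and the discriminant is $q_1^2 - 4 q_0 q_2$. The sketch below, with names and sample values chosen only for this illustration, checks that the Horn uniformization and the image of the discriminant in the one-dimensional torus $T(\ker A)$ give the same point $1/4$.

```python
from fractions import Fraction

A = [[1, 1, 1], [0, 1, 2]]  # columns: exponents of 1, t, t^2, below a row of ones
GALE = [1, -2, 1]           # the single column of a Gale dual of A
assert all(sum(a * b for a, b in zip(row, GALE)) == 0 for row in A)  # A * GALE = 0

def horn_value(v):
    """(GALE v)^GALE for a nonzero scalar v, that is v^1 * (-2 v)^(-2) * v^1."""
    value = Fraction(1)
    for b in GALE:
        value *= Fraction(b * v) ** b
    return value

def reduced_coordinate(q):
    """Image q^GALE = q0 * q1^(-2) * q2 of a point q of the torus in T(ker A)."""
    q0, q1, q2 = q
    return Fraction(q0 * q2, q1 * q1)

# The Horn uniformization is constant, as expected for a map out of a projective space of dimension zero.
assert horn_value(7) == horn_value(-3) == Fraction(1, 4)
# Points with q1^2 = 4 q0 q2 (a double root) have the same reduced coordinate.
for q in [(1, 4, 4), (3, 6, 3)]:
    assert q[1] ** 2 == 4 * q[0] * q[2]
    assert reduced_coordinate(q) == Fraction(1, 4)
```

In this case the reduced discriminantal variety is the single point $1/4$, in agreement with the rewriting of $q_1^2 = 4 q_0 q_2$ as $q_0 q_2 / q_1^2 = 1/4$.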



3.7 

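Proof of Theorem 5. Suppose that $X$ has maximum likelihood degree one. Then, by Theorem
2, there is a set of nonzero constants $d = (d_1, \dots, d_m)$ and an $n \times m$ integral matrix $B$ whose
column sums are zero such that the Horn uniformization

\[
\Psi : \mathbb{P}^{m-1} \dashrightarrow (\mathbb{C}^*)^m, \qquad u \longmapsto d \circ (Bu)^B
\]

maps dominantly to $X$. Write $n - k$ for the rank of $B$, and consider the largest subgroup of $\mathbb{Z}^n$ of
rank $n - k$ containing all the columns of $B$. Let $\mathscr{B}$ be a matrix whose columns form a basis of
this subgroup. Let $A$ and $C$ be integral matrices such that

1. $AB = 0$,

2. $B = \mathscr{B} C$,

3. the first row of $A$ is $(1, \dots, 1)$, and

4. the top row of the diagram below is exact:

\[
\begin{array}{ccccccccc}
0 & \longrightarrow & \mathbb{Z}^{n-k} & \xrightarrow{\ \mathscr{B}\ } & \mathbb{Z}^n & \xrightarrow{\ A\ } & \mathbb{Z}^k & \longrightarrow & 0 \\
 & & {\scriptstyle C} \uparrow & \nearrow {\scriptstyle B} & & & & & \\
 & & \mathbb{Z}^m & & & & & &
\end{array}
\]

Let $\Psi_{\mathscr{B}}$ be the Horn uniformization

\[
\Psi_{\mathscr{B}} : \mathbb{P}^{n-k-1} \dashrightarrow (\mathbb{C}^*)^{n-k}, \qquad v \longmapsto (\mathscr{B} v)^{\mathscr{B}}.
\]

In the notation introduced in Section 1.2, we have a commutative diagram

\[
\begin{array}{ccccc}
(\mathbb{C}^*)^n & \xrightarrow{\ \phi^{\mathscr{B}}\ } & (\mathbb{C}^*)^{n-k} & \xrightarrow{\ d \circ \phi^C\ } & (\mathbb{C}^*)^m \\
 & & \uparrow {\scriptstyle \Psi_{\mathscr{B}}} & & \uparrow {\scriptstyle \Psi} \\
 & & \mathbb{P}^{n-k-1} & \xleftarrow{\ \phi_C\ } & \mathbb{P}^{m-1}
\end{array}
\]

By Theorem 19, the commutative diagram restricts to that of dominant mappings

\[
\begin{array}{ccccc}
\nabla_A \cap (\mathbb{C}^*)^n & \longrightarrow & \widetilde{\nabla}_A & \longrightarrow & X \\
 & & \uparrow & & \uparrow \\
 & & \mathbb{P}^{n-k-1} & \longleftarrow & \mathbb{P}^{m-1}
\end{array}
\]

This proves that (i) implies (ii). Commutativity of the above diagrams also shows that the monomial
map with finite fibers

\[
d \circ \phi^C : (\mathbb{C}^*)^{n-k} \longrightarrow (\mathbb{C}^*)^m
\]

restricts to a birational isomorphism

\[
\widetilde{\nabla}_A \longrightarrow X.
\]

Indeed, by Lemma 15, a fiber of $\Psi$ over a general point of $X$ is connected.

Conversely, suppose that $X$ satisfies the condition (ii). Theorem 19 and Theorem 2 show that
a reduced $A$-discriminantal variety has maximum likelihood degree one. Therefore, by Corollary
4, $X$ has maximum likelihood degree one. □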